LCORE-1377 Parse index name from chunk metadata source attribute#1300
Conversation
Walkthroughrefine resolve_source_for_result to read Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/responses.py`:
- Around line 849-852: The fallback uses attributes.get("source") but then looks
up rag_id_mapping (built as vector_db_id -> rag_id), which is the wrong key
shape and will usually return the raw metadata; change the fallback to return
attr_source directly instead of rag_id_mapping.get(attr_source, attr_source),
i.e., when attributes.get("source") is present just return attr_source, or
alternatively implement and call a dedicated index-name-to-rag lookup (e.g., an
indexNameToRag map) before this helper if you need translation—update references
to rag_id_mapping, attributes.get("source"), and the fallback branch
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50c19659-1ea1-42fa-b276-7ea4b3b584f9
📒 Files selected for processing (2)
src/utils/responses.pytests/unit/utils/test_responses.py
bbccaa0 to
3c1abde
Compare
|
@are-ces Could you PTAL? |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/responses.py (1)
844-854:⚠️ Potential issue | 🟡 MinorSupport dict-shaped results in this fallback.
parse_referenced_documents()already handles both object and dict results, but this branch still pullsattributesviagetattr(...)only. In a multi-store response backed by dicts, bothvector_store_idand the newsourcefallback will be skipped andsourceresolves toNone.🩹 Proposed fix
if len(vector_store_ids) > 1: - attributes = getattr(result, "attributes", {}) or {} + if isinstance(result, dict): + attributes = result.get("attributes", {}) or {} + else: + attributes = getattr(result, "attributes", {}) or {} attr_store_id: Optional[str] = attributes.get("vector_store_id") if attr_store_id: return rag_id_mapping.get(attr_store_id, attr_store_id)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/responses.py` around lines 844 - 854, The fallback assumes result is an object and uses getattr(result, "attributes", {}), which skips dict-shaped results; update the attributes extraction in parse_referenced_documents() (the block that sets attributes, attr_store_id, attr_source and uses rag_id_mapping) to first handle dictionaries (e.g., if isinstance(result, dict): attributes = result.get("attributes", {}) ) and otherwise use getattr, then proceed to check attr_store_id and attr_source as before so both "vector_store_id" and the "source" fallback work for dict and object results.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/utils/responses.py`:
- Around line 844-854: The fallback assumes result is an object and uses
getattr(result, "attributes", {}), which skips dict-shaped results; update the
attributes extraction in parse_referenced_documents() (the block that sets
attributes, attr_store_id, attr_source and uses rag_id_mapping) to first handle
dictionaries (e.g., if isinstance(result, dict): attributes =
result.get("attributes", {}) ) and otherwise use getattr, then proceed to check
attr_store_id and attr_source as before so both "vector_store_id" and the
"source" fallback work for dict and object results.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6c64e5a7-232b-43e1-ae13-1e82277a704a
📒 Files selected for processing (2)
src/utils/responses.pytests/unit/utils/test_responses.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/utils/test_responses.py
|
Thanks @tisnik , could you also PTAL at the rag-content counterpart? |
are-ces
left a comment
There was a problem hiding this comment.
Just a nit, otherwise LGTM
3c1abde to
5fb28b9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/utils/responses.py`:
- Around line 844-850: The branch in parse_referenced_documents() uses
getattr(result, "attributes", {}) which misses dict-shaped results; change
attribute extraction to handle both objects and dicts (e.g., if
isinstance(result, dict): attributes = result.get("attributes", {}) else:
attributes = getattr(result, "attributes", {}) or {}), and when resolving
attr_source use attributes.get("source") with a fallback to result.get("source")
for dict-backed results so dict results can expose the embedded "source" the
same way object results do.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 57799d26-976e-41b2-9344-c2472d3afe91
📒 Files selected for processing (2)
src/utils/responses.pytests/unit/utils/test_responses.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/utils/test_responses.py
| attributes = getattr(result, "attributes", {}) or {} | ||
|
|
||
| # Primary: read index name embedded directly by rag-content. | ||
| # This value is already the user-facing rag_id, not a vector_db_id, | ||
| # so no mapping is needed. | ||
| attr_source: Optional[str] = attributes.get("source") | ||
| if attr_source: |
There was a problem hiding this comment.
Handle dict-backed results before reading source.
parse_referenced_documents() already supports both object and dict-shaped results, but this branch only uses getattr. When result is a dict, Line 849 never sees the embedded source, so multi-store source resolution falls back to None for referenced documents.
Suggested fix
- attributes = getattr(result, "attributes", {}) or {}
+ raw_attributes = (
+ result.get("attributes", {})
+ if isinstance(result, Mapping)
+ else getattr(result, "attributes", {})
+ )
+ attributes = raw_attributes if isinstance(raw_attributes, Mapping) else {}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 844 - 850, The branch in
parse_referenced_documents() uses getattr(result, "attributes", {}) which misses
dict-shaped results; change attribute extraction to handle both objects and
dicts (e.g., if isinstance(result, dict): attributes = result.get("attributes",
{}) else: attributes = getattr(result, "attributes", {}) or {}), and when
resolving attr_source use attributes.get("source") with a fallback to
result.get("source") for dict-backed results so dict results can expose the
embedded "source" the same way object results do.
Description
Resolve the embedded index name in the chunk metadata "source" attribute to figure out which store a chunk came from. This is done after single-store ID mapping and vector_store_id attribute resolution have been tried.
The rag-content counterpart PR is lightspeed-core/rag-content#99
(Continuation of #1135, #1208, #1248)
Type of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
metadata: {"source": "test-index", ...}throughvector_io.insertvector_stores/{id}/search.result.attributes["source"]=="test-index".Summary by CodeRabbit
Bug Fixes
Tests